---
title: Configuring PXF for Secure HDFS
---

When Kerberos is enabled for your HDFS filesystem, the PXF Service, as an HDFS client, requires a principal and keytab file to authenticate access to HDFS. To read or write files on a secure HDFS, you must create and deploy Kerberos principals and keytabs for PXF, and ensure that Kerberos authentication is enabled and functioning.

PXF supports simultaneous access to multiple Kerberos-secured Hadoop clusters.

When Kerberos is enabled, you access Hadoop with the PXF principal and keytab. You can also choose to access Hadoop using the identity of the Greenplum Database user.

You configure the impersonation setting and the Kerberos principal and keytab for a Hadoop server via the `pxf-site.xml` server-specific configuration file. Refer to [About the pxf-site.xml Configuration File](cfg_server.html#pxf-site) for more information about the configuration properties in this file.

Configure the Kerberos principal and keytab using the following `pxf-site.xml` properties:

| Property       | Description                                | Default Value |
|----------------|--------------------------------------------|---------------|
| pxf.service.kerberos.principal | The Kerberos principal name. | gpadmin/\_HOST@EXAMPLE.COM |
| pxf.service.kerberos.keytab | The file system path to the Kerberos keytab file. | $PXF_BASE/keytabs/pxf.service.keytab |

## <a id="prereq"></a>Prerequisites

Before you configure PXF for access to a secure HDFS filesystem, ensure that you have:

- Configured a PXF server for the Hadoop cluster, and can identify the server configuration name.

- Configured and started PXF as described in [Configuring PXF](instcfg_pxf.html).

- Verified that Kerberos is enabled for your Hadoop cluster.

- Verified that the HDFS configuration parameter `dfs.block.access.token.enable` is set to `true`. You can find this setting in the `hdfs-site.xml` configuration file on a host in your Hadoop cluster.

- Noted the host name or IP address of each Greenplum Database segment host (\<seghost\>) and the Kerberos Key Distribution Center \(KDC\) \<kdc-server\> host.

- Noted the name of the Kerberos \<realm\> in which your cluster resides.

- Installed the Kerberos client packages on **each** Greenplum Database segment host if they are not already installed. You must have superuser permissions to install operating system packages. For example:

    ``` shell
    root@gphost$ rpm -qa | grep krb
    root@gphost$ yum install krb5-libs krb5-workstation
    ```

## <a id="scenarios"></a>Use Cases and Configuration Scenarios

The following scenarios describe the use cases and configuration required when
you use PXF to access a Kerberos-secured Hadoop cluster.

**Note**: These scenarios assume that `gpadmin` is the PXF process owner.

### <a id="principal_proxy"></a>Accessing Hadoop as the Greenplum User Proxied by the Kerberos Principal

In this configuration, PXF accesses Hadoop as the Greenplum user proxied
by the Kerberos principal. The Kerberos principal is the Hadoop proxy user and
accesses Hadoop as the Greenplum user.

<img alt="Accessing Hadoop as the Greenplum User Proxied by the Kerberos Principal" src="graphics/impersonation_cases/impersonation-case-1-kerberos.png" class="image" />

The following table identifies the `pxf.service.user.impersonation` and `pxf.service.user.name` settings, and the PXF and Hadoop configuration required for this use case:

| Impersonation  | Service&nbsp;User |  PXF Configuration | Hadoop&nbsp;Configuration&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
|----------------|-------------------|--------------------|-----------|
|  true          | Kerberos principal | Perform the [Configuration Procedure](#procedure) in this topic. | Set the Kerberos principal as the Hadoop proxy user as described in [Configure Hadoop Proxying](pxfuserimpers.html#hadoop). |

### <a id="principal_hadoop"></a>Accessing Hadoop as the Kerberos Principal

In this configuration, PXF accesses Hadoop as the Kerberos principal. A query initiated by
any Greenplum user appears on the Hadoop side as originating from the Kerberos principal.

<img alt="Accessing Hadoop as the Kerberos Principal" src="graphics/impersonation_cases/impersonation-case-2-kerberos.png" class="image" />

The following table identifies the `pxf.service.user.impersonation` and `pxf.service.user.name` settings, and the PXF and Hadoop configuration required for this use case:

| Impersonation  | Service&nbsp;User |  PXF Configuration | Hadoop&nbsp;Configuration&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
|----------------|-------------------|--------------------|-----------|
|  false         | Kerberos principal | Perform the [Configuration Procedure](#procedure.html) in this topic, and then turn off user impersonation as described in [Configure PXF User Impersonation](pxfuserimpers.html#pxf_cfg_impers). | None required. |

### <a id="custom_hadoop"></a>Accessing Hadoop as a &lt;custom> User

In this configuration, PXF accesses Hadoop as a \<custom> user (for example, `hive`).
The Kerberos principal is the Hadoop proxy user. A query initiated by any Greenplum
user appears on the Hadoop side as originating from the \<custom> user.

<img alt="Accessing Hadoop as a custom User" src="graphics/impersonation_cases/impersonation-case-3-kerberos.png" class="image" />

The following table identifies the `pxf.service.user.impersonation` and `pxf.service.user.name` settings, and the PXF and Hadoop configuration required for this use case:

| Impersonation  | Service&nbsp;User |  PXF Configuration | Hadoop&nbsp;Configuration&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
|----------------|-------------------|--------------------|-----------|
|  false         |  \<custom>        | Perform the [Configuration Procedure](#procedure) in this topic, turn off user impersonation as described in [Configure PXF User Impersonation](pxfuserimpers.html#pxf_cfg_impers), and [Configure the Hadoop User](pxfuserimpers.html#pxf_cfg_user) to the \<custom> user name. | Set the Kerberos principal as the Hadoop proxy user as described in [Configure Hadoop Proxying](pxfuserimpers.html#hadoop). |

**Note:** PXF does not support accessing a Kerberos-secured Hadoop cluster
with a &lt;custom> user impersonating Greenplum users. PXF requires that you
impersonate Greenplum users using the Kerberos principal.

## <a id="procedure"></a>Procedure

There are different procedures for configuring PXF for secure HDFS with a [Microsoft Active Directory KDC Server](#proc_ad) vs. with an [MIT Kerberos KDC Server](#proc_mit).


### <a id="proc_ad"></a>Configuring PXF with a Microsoft Active Directory Kerberos KDC Server

When you configure PXF for secure HDFS using an AD Kerberos KDC server, you will perform tasks on both the KDC server host and the Greenplum Database master host.

**Perform the following steps the Active Directory domain controller**:

1. Start **Active Directory Users and Computers**.
2. Expand the forest domain and the top-level UNIX organizational unit that describes your Greenplum user domain.
3. Select **Service Accounts**, right-click, then select **New->User**.
4. Type a name, eg. `ServiceGreenplumPROD1`, and change the login name to `gpadmin`. Note that the login name should be in compliance with POSIX standard and match `hadoop.proxyuser.<name>.hosts/groups` in the Hadoop `core-site.xml` and the Kerberos principal.
5. Type and confirm the Active Directory service account password. Select the **User cannot change password** and **Password never expires** check boxes, then click **Next**. For security reasons, if you can't have **Password never expires** checked, you will need to generate new keytab file (step 7) every time you change the password of the service account. 
6. Click **Finish** to complete the creation of the new user principal. 
7. Open Powershell or a command prompt and run the `ktpass` command to generate the keytab file. For example:

    ``` shell
    powershell#>ktpass -out pxf.service.keytab -princ gpadmin@EXAMPLE.COM -mapUser ServiceGreenplumPROD1 -pass ******* -crypto all -ptype KRB5_NT_PRINCIPAL
    ```

    With Active Directory, the principal and the keytab file are shared by all Greenplum Database segment hosts. 
	
8. Copy the `pxf.service.keytab` file to the Greenplum master host.

**Perform the following procedure on the Greenplum Database master host**:

1. Log in to the Greenplum Database master host. For example:

    ``` shell
    $ ssh gpadmin@<gpmaster>
    ```
    
2. Identify the name of the PXF Hadoop server configuration, and navigate to the server configuration directory. For example, if the server is named `hdp3`:

    ```shell
    gpadmin@gpmaster$ cd $PXF_BASE/servers/hdp3
    ```

3. If the server configuration does not yet include a `pxf-site.xml` file, copy the template file to the directory. For example:

    ``` shell
    gpadmin@gpmaster$ cp $PXF_HOME/templates/pxf-site.xml .
    ```

4. Open the `pxf-site.xml` file in the editor of your choice, and update the keytab and principal property settings, if required. Specify the location of the keytab file and the Kerberos principal, substituting your realm. For example:

    ``` xml
    <property>
        <name>pxf.service.kerberos.principal</name>
        <value>gpadmin@EXAMPLE.COM</value>
    </property>
    <property>
        <name>pxf.service.kerberos.keytab</name>
        <value>${pxf.conf}/keytabs/pxf.service.keytab</value>
    </property>
    ```

5. Enable user impersonation as described in [Configure PXF User Impersonation](pxfuserimpers.html#pxf_cfg_impers), and configure or verify Hadoop proxying for the *primary* component of the Kerberos principal as described in [Configure Hadoop Proxying](pxfuserimpers.html#hadoop). For example, if your principal is `gpadmin@EXAMPLE.COM`, configure proxying for the Hadoop user `gpadmin`.

4. Save the file and exit the editor.

5. Synchronize the PXF configuration to your Greenplum Database cluster and restart PXF:

    ``` shell
    gpadmin@master$ pxf cluster sync
    gpadmin@master$ pxf cluster restart
    ```

6. Step 7 does not synchronize the keytabs in `$PXF_BASE`. You must distribute the keytab file to `$PXF_BASE/keytabs/`. Locate the keytab file, copy the file to the `$PXF_BASE` runtime configuration directory, and set required permissions. For example:
	
    ``` shell
    gpadmin@gpmaster$ gpscp -f hostfile_all pxf.service.keytab =:$PXF_BASE/keytabs/
    gpadmin@gpmaster$ gpssh -f hostfile_all chmod 400 $PXF_BASE/keytabs/pxf.service.keytab
    ```


### <a id="proc_mit"></a>Configuring PXF with an MIT Kerberos KDC Server

When you configure PXF for secure HDFS using an MIT Kerberos KDC server, you will perform tasks on both the KDC server host and the Greenplum Database master host.

**Perform the following steps on the MIT Kerberos KDC server host**:

1.  Log in to the Kerberos KDC server as the `root` user.

    ``` shell
    $ ssh root@<kdc-server>
    root@kdc-server$ 
    ```

2. Distribute the `/etc/krb5.conf` Kerberos configuration file on the KDC server host to **each** segment host in your Greenplum Database cluster if not already present. For example:

    ``` shell
    root@kdc-server$ scp /etc/krb5.conf seghost:/etc/krb5.conf
    ```

3.  Use the `kadmin.local` command to create a Kerberos PXF Service principal for **each** Greenplum Database segment host. The service principal should be of the form `gpadmin/<seghost>@<realm>` where \<seghost\> is the DNS resolvable, fully-qualified hostname of the segment host system \(output of the `hostname -f` command\).

    For example, these commands create Kerberos PXF Service principals for the hosts named host1.example.com, host2.example.com, and host3.example.com in the Kerberos realm named `EXAMPLE.COM`:

    ``` shell
    root@kdc-server$ kadmin.local -q "addprinc -randkey -pw changeme gpadmin/host1.example.com@EXAMPLE.COM"
    root@kdc-server$ kadmin.local -q "addprinc -randkey -pw changeme gpadmin/host2.example.com@EXAMPLE.COM"
    root@kdc-server$ kadmin.local -q "addprinc -randkey -pw changeme gpadmin/host3.example.com@EXAMPLE.COM"
    ```

4.  Generate a keytab file for each PXF Service principal that you created in the previous step. Save the keytab files in any convenient location (this example uses the directory `/etc/security/keytabs`). You will deploy the keytab files to their respective Greenplum Database segment host machines in a later step. For example:

    ``` shell
    root@kdc-server$ kadmin.local -q "xst -norandkey -k /etc/security/keytabs/pxf-host1.service.keytab gpadmin/host1.example.com@EXAMPLE.COM"
    root@kdc-server$ kadmin.local -q "xst -norandkey -k /etc/security/keytabs/pxf-host2.service.keytab gpadmin/host2.example.com@EXAMPLE.COM"
    root@kdc-server$ kadmin.local -q "xst -norandkey -k /etc/security/keytabs/pxf-host3.service.keytab gpadmin/host3.example.com@EXAMPLE.COM"
    ```

    Repeat the `xst` command as necessary to generate a keytab for each PXF Service principal that you created in the previous step.

5.  List the principals. For example:

    ``` shell
    root@kdc-server$ kadmin.local -q "listprincs"
    ```

6.  Copy the keytab file for each PXF Service principal to its respective segment host. For example, the following commands copy each principal generated in step 4 to the PXF default keytab directory on the segment host when `PXF_BASE=/usr/local/greenplum-pxf`:

    ``` shell
    root@kdc-server$ scp /etc/security/keytabs/pxf-host1.service.keytab host1.example.com:/usr/local/greenplum-pxf/keytabs/pxf.service.keytab
    root@kdc-server$ scp /etc/security/keytabs/pxf-host2.service.keytab host2.example.com:/usr/local/greenplum-pxf/keytabs/pxf.service.keytab
    root@kdc-server$ scp /etc/security/keytabs/pxf-host3.service.keytab host3.example.com:/usr/local/greenplum-pxf/keytabs/pxf.service.keytab
    ```

    Note the file system location of the keytab file on each PXF host; you will need this information for a later configuration step.

7. Change the ownership and permissions on the `pxf.service.keytab` files. The files must be owned and readable by only the `gpadmin` user. For example:

    ``` shell 
    root@kdc-server$ ssh host1.example.com chown gpadmin:gpadmin /usr/local/greenplum-pxf/keytabs/pxf.service.keytab
    root@kdc-server$ ssh host1.example.com chmod 400 /usr/local/greenplum-pxf/keytabs/pxf.service.keytab
    root@kdc-server$ ssh host2.example.com chown gpadmin:gpadmin /usr/local/greenplum-pxf/keytabs/pxf.service.keytab
    root@kdc-server$ ssh host2.example.com chmod 400 /usr/local/greenplum-pxf/keytabs/pxf.service.keytab
    root@kdc-server$ ssh host3.example.com chown gpadmin:gpadmin /usr/local/greenplum-pxf/keytabs/pxf.service.keytab
    root@kdc-server$ ssh host3.example.com chmod 400 /usr/local/greenplum-pxf/keytabs/pxf.service.keytab
    ```

**Perform the following steps on the Greenplum Database master host**:

1. Log in to the master host. For example:

    ``` shell
    $ ssh gpadmin@<gpmaster>
    ```

2. Identify the name of the PXF Hadoop server configuration that requires Kerberos access.

3. Navigate to the server configuration directory. For example, if the server is named `hdp3`:

    ```shell
    gpadmin@gpmaster$ cd $PXF_BASE/servers/hdp3
    ```

4. If the server configuration does not yet include a `pxf-site.xml` file, copy the template file to the directory. For example:

    ``` shell
    gpadmin@gpmaster$ cp $PXF_HOME/templates/pxf-site.xml .
    ```

5. Open the `pxf-site.xml` file in the editor of your choice, and update the keytab and principal property settings, if required. Specify the location of the keytab file and the Kerberos principal, substituting your realm. *The default values for these settings are identified below*:

    ``` xml
    <property>
        <name>pxf.service.kerberos.principal</name>
        <value>gpadmin/_HOST@EXAMPLE.COM</value>
    </property>
    <property>
        <name>pxf.service.kerberos.keytab</name>
        <value>${pxf.conf}/keytabs/pxf.service.keytab</value>
    </property>
    ```
    
    PXF automatically replaces ` _HOST` with the FQDN of the segment host.

6. If you want to access Hadoop as the Greenplum Database user:

    1. Enable user impersonation as described in [Configure PXF User Impersonation](pxfuserimpers.html#pxf_cfg_impers).
    2. Configure Hadoop proxying for the *primary* component of the Kerberos principal as described in [Configure Hadoop Proxying](pxfuserimpers.html#hadoop). For example, if your principal is `gpadmin/_HOST@EXAMPLE.COM`, configure proxying for the Hadoop user `gpadmin`.

7. If you want to access Hadoop using the identity of the Kerberos principal, disable user impersonation as described in [Configure PXF User Impersonation](pxfuserimpers.html#pxf_cfg_impers).

8. If you want to access Hadoop as a custom user:

    1. Disable user impersonation as described in [Configure PXF User Impersonation](pxfuserimpers.html#pxf_cfg_impers).
    2. Configure the custom user name as described in [Configure the Hadoop User](pxfuserimpers.html#pxf_cfg_user).
    3. Configure Hadoop proxying for the *primary* component of the Kerberos principal as described in [Configure Hadoop Proxying](pxfuserimpers.html#hadoop). For example, if your principal is `gpadmin/_HOST@EXAMPLE.COM`, configure proxying for the Hadoop user `gpadmin`.

8. Save the file and exit the editor.

9. Synchronize the PXF configuration to your Greenplum Database cluster:

    ``` shell
    gpadmin@seghost$ pxf cluster sync
    ```

